Development of a Corpus Workbench for the METU Turkish Corpus
Abstract
We introduce a corpus workbench designed and implemented for the METU Turkish Corpus. The design introduces a number of useful features, and the workbench can in principle be used with any TEI- and XML-compliant corpus, provided that the corpus can be indexed in the format the workbench requires.
Similar resources
METU Turkish Discourse Bank Browser
In this paper, the METU Turkish Discourse Bank Browser, a tool developed for browsing the annotated discourse relations in the Middle East Technical University (METU) Turkish Discourse Bank (TDB) project, is presented. The tool provides both a clear interface for browsing the annotated corpus and a wide range of search options to analyze the annotations.
A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus
This paper describes first steps towards extending the METU Turkish Corpus from a sentence-level language resource to a discourse-level resource by annotating its discourse connectives and their arguments. The project is based on the same principles as the Penn Discourse TreeBank (http://www.seas.upenn.edu/~pdtb) and is supported by TUBITAK, The Scientific and Technological Research Council of ...
Marmara Turkish Coreference Corpus and Coreference Resolution Baseline
We describe the Marmara Turkish Coreference Corpus, which is an annotation of the whole METU-Sabanci Turkish Treebank with mentions and coreference chains. Collecting nine or more independent annotations for each document allowed for fully automatic adjudication. We provide a baseline system for Turkish mention detection and coreference resolution and evaluate it on the corpus.
On developing new text and audio corpora and speech recognition tools for the Turkish language
This paper describes recent work towards development of new corpora and tools for Turkish speech research. This effort represents an on-going collaboration between the Center for Spoken Language Research (CSLR) at the University of Colorado and the Department of Electrical Engineering at the Middle East Technical University (METU). A new text corpus developed from Turkish newspapers’ text is de...
Use of Lexical Statistics for Compound Word Recognition and Segmentation in Turkish
Compound words are cross-linguistic morphological phenomena that occur in all languages. Compound words are widely accepted to be stored in the lexicon but their constituents need to be accessed during both language learning and production processes. In this study, the use of corpora was investigated for how to differentiate single-stem words from single-word compounds and then how to segment c...